Document Image Applications
نویسنده
چکیده
The analysis of document images is a difficult and ill-defined task. Unlike the graphics operation of rendering a document into a pixmap, using a structured page-level description such as pdf, the analysis problem starts with the pixmap and attempts to generate a structured description. This description is hierarchical, and typically consists of two interleaved trees, one giving the physical layout of the elements and the other affixing semantic tags. Tag assignment is ambiguous unless the rules determining structure and rendering are tightly constrained and known in advance. Although the graphical rendering process invariably loses structural information, much useful information can be extracted from the pixmaps. Some of that information, such as skew, warp and text orientation detection, is related to the digitization process and is useful for improving the rendering on a screen or paper. The layout hierarchy can be used to reflow the text for small displays or magnified printing. Other information is useful for organizing the information in an index, or for compressing the image data. This chapter is concerned with robust and efficient methods for extracting such useful data. What representation(s) should be used for image analysis? Empirically, a very large set of document image analysis (DIA) problems can be accurately and efficiently addressed with image morphology and related image processing methods. When the image is used as the fundamental representation, and analysis (decisions) are based on nonlinear image operations, many benefits accrue: (1) analysis is very fast, especially if carried out at relevant image scales; (2) analysis retains the image geometry, so that processing errors are obvious, the accuracy of results is visually evident, and the operations are easily improved; (3) alignment between different renderings and resolutions is maintained; (4) pixel labelling is made in parallel by neighbors; (5) sequential (e.g., filling) operations are used where pixels can have arbitrarily long-range effects; (6) pixel groupings are easily determined; (7) segmentation output is naturally represented using masks; (8) implementation is simplified because only a relatively small number of imaging operations must be implemented efficiently; (9) applications can use both shape and texture, at multiple resolutions, to label pixels; and (10) the statistical properties of pixels and sets of pixels can be used to make robust estimation. Table 1 depicts document image analysis (DIA) as occupying a high to intermediate position in terms of constraints, which depend on the accuracy of the statistical models representing the collection of images. Bayesian statistical models are the most constrained. Analysis is performed by generation from the models, using maximum a posteriori (MAP) inference. These techniques have been used for OCR [7] and for locating textlines [6], and can be implemented efficiently using heuristics despite the fact that they require matching all templates at all possible locations [9].
منابع مشابه
Document Image Retrieval Based on Keyword Spotting Using Relevance Feedback
Keyword Spotting is a well-known method in document image retrieval. In this method, Search in document images is based on query word image. In this Paper, an approach for document image retrieval based on keyword spotting has been proposed. In proposed method, a framework using relevance feedback is presented. Relevance feedback, an interactive and efficient method is used in this paper to imp...
متن کاملLearning Document Image Features With SqueezeNet Convolutional Neural Network
The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered to be the current state of the art model for this task. However, there are two major drawbacks for these classifiers: the huge computational power demand for...
متن کاملConnected Component Based Word Spotting on Persian Handwritten image documents
Word spotting is to make searchable unindexed image documents by locating word/words in a doc-ument image, given a query word. This problem is challenging, mainly due to the large numberof word classes with very small inter-class and substantial intra-class distances. In this paper, asegmentation-based word spotting method is presented for multi-writer Persian handwritten doc-...
متن کاملDocument Image Dewarping Based on Text Line Detection and Surface Modeling (RESEARCH NOTE)
Document images produced by scanner or digital camera, usually suffer from geometric and photometric distortions. Both of them deteriorate the performance of OCR systems. In this paper, we present a novel method to compensate for undesirable geometric distortions aiming to improve OCR results. Our methodology is based on finding text lines by dynamic local connectivity map and then applying a l...
متن کاملDocument Analysis And Classification Based On Passing Window
In this paper we present Document analysis and classification system to segment and classify contents of Arabic document images. This system includes preprocessing, document segmentation, feature extraction and document classification. A document image is enhanced in the preprocessing by removing noise, binarization, and detecting and correcting image skew. In document segmentation, an algorith...
متن کاملPersian Printed Document Analysis and Page Segmentation
This paper presents, a hybrid method, low-resolution and high-resolution, for Persian page segmentation. In the low-resolution page segmentation, a pyramidal image structure is constructed for multiscale analysis and segments document image to a set of regions. By high-resolution page segmentation, by connected components analysis, each region is segmented to homogeneous regions and identifyi...
متن کامل